Chapter 12: Web Usage Mining; 12.1 Data Collection and Pre-processing; 12.1.1 Sources and Types of Data; Fig. 3. Portion of a Typical Server Log
Abstract
With the continued growth and proliferation of e-commerce, Web services, and Web-based information systems, the volumes of clickstream and user data collected by Web-based organizations in their daily operations have reached astronomical proportions. Analyzing such data can help these organizations determine the lifetime value of clients, design cross-marketing strategies across products and services, evaluate the effectiveness of promotional campaigns, optimize the functionality of Web-based applications, provide more personalized content to visitors, and find the most effective logical structure for their Web space. This type of analysis involves the automatic discovery of meaningful patterns and relationships from a large collection of primarily semi-structured data, often stored in Web and application server access logs, as well as in related operational data sources.
Web usage mining refers to the automatic discovery and analysis of patterns in clickstream and associated data collected or generated as a result of user interactions with Web resources on one or more Web sites [114, 505, 387]. The goal is to capture, model, and analyze the behavioral patterns and profiles of users interacting with a Web site. The discovered patterns are usually represented as collections of pages, objects, or resources that are frequently accessed by groups of users with common needs or interests.
Following the standard data mining process [173], the overall Web usage mining process can be divided into three interdependent stages: data collection and pre-processing, pattern discovery, and pattern analysis. In the pre-processing stage, the clickstream data is cleaned and partitioned into a set of user transactions representing the activities of each user during different visits to the site.
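The cleaning and partitioning step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the chapter's own procedure:

- Log entries are assumed to be in the NCSA Common Log Format.
- A user is identified by client host (IP address) alone.
- Failed or redirected requests (non-2xx) are dropped, along with embedded resources matching a hypothetical suffix list (`IGNORED_SUFFIXES`).
- Each user's remaining page views are split into sessions using a 30-minute inactivity timeout (`SESSION_TIMEOUT`). This timeout is a commonly used heuristic, treated here as an assumption.

The names `parse_line` and `sessionize` are illustrative only.

```python
import re
from datetime import datetime, timedelta

# Assumed input: NCSA Common Log Format
#   host ident authuser [date] "request" status bytes
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) \S+'
)
TIME_FMT = "%d/%b/%Y:%H:%M:%S %z"
# Hypothetical filter list of embedded resources that are not page views
IGNORED_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js")
# Assumed inactivity timeout separating two visits by the same user
SESSION_TIMEOUT = timedelta(minutes=30)


def parse_line(line):
    """Return (host, timestamp, url) for a page view, or None if the entry is filtered out."""
    m = LOG_RE.match(line)
    if m is None:
        return None  # malformed entry
    if not m.group("status").startswith("2"):
        return None  # drop failed and redirected requests
    url = m.group("url")
    if url.lower().endswith(IGNORED_SUFFIXES):
        return None  # drop images, stylesheets, scripts
    return m.group("host"), datetime.strptime(m.group("time"), TIME_FMT), url


def sessionize(lines, timeout=SESSION_TIMEOUT):
    """Group cleaned page views into per-user sessions using an inactivity timeout.

    Users are identified by host only (an assumption; real systems may also use
    user agents, cookies, or login IDs).
    """
    views = [v for v in map(parse_line, lines) if v is not None]
    views.sort(key=lambda v: (v[0], v[1]))  # order by user, then by time
    sessions = []
    last_host, last_time = None, None
    for host, ts, url in views:
        # New user, or gap longer than the timeout: start a new session
        if host != last_host or ts - last_time > timeout:
            sessions.append((host, []))
        sessions[-1][1].append(url)
        last_host, last_time = host, ts
    return sessions


if __name__ == "__main__":
    log = [
        '10.0.0.1 - - [10/Oct/2006:13:55:36 -0500] "GET /index.html HTTP/1.0" 200 2326',
        '10.0.0.1 - - [10/Oct/2006:13:55:40 -0500] "GET /logo.gif HTTP/1.0" 200 512',
        '10.0.0.1 - - [10/Oct/2006:13:58:02 -0500] "GET /products.html HTTP/1.0" 200 4100',
        '10.0.0.1 - - [10/Oct/2006:15:10:00 -0500] "GET /index.html HTTP/1.0" 200 2326',
        '10.0.0.2 - - [10/Oct/2006:13:56:00 -0500] "GET /missing.html HTTP/1.0" 404 0',
    ]
    print(sessionize(log))
    # → [('10.0.0.1', ['/index.html', '/products.html']), ('10.0.0.1', ['/index.html'])]
```

In this example, the image request and the 404 entry are removed during cleaning. The gap of more than an hour between the third and fourth entries splits host 10.0.0.1 into two transactions.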
Other sources of knowledge, such as the site content or structure, as well as semantic domain knowledge from site ontologies (such as product catalogs or concept hierarchies), may also be used in pre-processing or to enhance user transaction data. In the pattern discovery stage, statistical, database, and machine learning operations are performed to obtain hidden patterns reflecting the typical behavior of users, as well as summary statistics on Web resources, sessions, and users. In the final stage of the process, the discovered patterns and statistics are further processed and filtered, possibly resulting in aggregate user models that can be used as input to applications such as recommendation engines, visualization tools, and Web analytics and report generation tools. The overall process is depicted in Fig. 1. In …
[Fig. 1. The Web usage mining process]
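Once user transactions are available, the simplest forms of pattern discovery mentioned in this abstract can be illustrated with plain counting. These are summary statistics on resources and sessions, and sets of pages frequently accessed together. The sketch below is a deliberate simplification and makes several assumptions:

- The functions `page_statistics` and `frequent_pairs` are hypothetical.
- Each session is treated as an unordered set of pages.
- Support is defined as the fraction of sessions containing a page pair.
- Only pairs are counted, rather than running a full frequent-itemset algorithm such as Apriori.

```python
from collections import Counter
from itertools import combinations


def page_statistics(sessions):
    """Count the sessions containing each page and compute the mean session length."""
    # set(session) so a page revisited within one session is counted once
    page_counts = Counter(page for session in sessions for page in set(session))
    mean_length = sum(len(s) for s in sessions) / len(sessions) if sessions else 0.0
    return page_counts, mean_length


def frequent_pairs(sessions, min_support):
    """Return page pairs whose support (fraction of sessions containing both) >= min_support."""
    n = len(sessions)
    pair_counts = Counter()
    for session in sessions:
        # Sorting gives each unordered pair a single canonical key
        for pair in combinations(sorted(set(session)), 2):
            pair_counts[pair] += 1
    return {pair: count / n for pair, count in pair_counts.items() if count / n >= min_support}


if __name__ == "__main__":
    sessions = [
        ["/index.html", "/products.html", "/cart.html"],
        ["/index.html", "/products.html"],
        ["/index.html", "/about.html"],
        ["/products.html", "/cart.html"],
    ]
    counts, mean_len = page_statistics(sessions)
    print(counts["/index.html"], mean_len)  # → 3 2.25
    print(frequent_pairs(sessions, 0.5))
    # → {('/cart.html', '/products.html'): 0.5, ('/index.html', '/products.html'): 0.5}
```

Counting only pairs keeps the example short. For itemsets of arbitrary size, a level-wise algorithm such as Apriori would extend the same support test to larger candidate sets.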
Similar resources
Analyzing the User Navigation Pattern from Weblogs Using Data Pre-processing Technique
In the real world, many users are attracted to online shopping, so many transactions take place on websites. A weblog contains a series of entries that are frequently added as users access the website. Based on user interest, it can be classified into related and unrelated data. The related data can be considered successful responses, but the unrelated data can be considered as...
Pre Processing of Web Logs – An Improved Approach For E-Commerce Websites
In this paper, an improved approach for pre-processing web log data has been proposed and evaluated so that it can be applied to the web logs of e-commerce web sites. The resultant web log data after these pre-processing steps can be used for further pattern discovery and analysis that helps to provide useful predictions to enhance e-commerce. Ideally, the input for the Web Usage Mining process ...
Data Pre-processing on Web Server Logs for Generalized Association Rules Mining Algorithm
Web log file analysis began as a way for IT administrators to ensure adequate bandwidth and server capacity on their organization's website. Log file data can offer valuable insight into web site usage. It reflects actual usage in natural working conditions, compared to the artificial setting of a usability lab. It represents the activity of many users, over a potentially long period of time, compa...
An Overview of Preprocessing of Web Log Files for Web Usage Mining
With Internet usage gaining popularity and the steady growth of users, the World Wide Web has become a huge repository of data and serves as an important platform for the dissemination of information. The users’ accesses to Web sites are stored in Web server logs. However, the data stored in the log files do not present an accurate picture of the users’ accesses to the Web site. Hence, prep...
Analysis of Pre-processing and Post-processing Methods and Using Data Mining to Diagnose Heart Diseases
Today, a great deal of data is generated in the medical field. Acquiring useful knowledge from this raw data requires data processing and the detection of meaningful patterns, and this objective can be achieved through data mining. Using data mining to diagnose and prognose heart diseases has become one of the areas of interest for researchers in recent years. In this study, the literature on the ap...
Publication date: 2006